NSF PAR Search | NSF Public Access Repository

Lessons from assembling UCEs: A comparison of common methods and the case of Clavinomia (Halictidae)

https://doi.org/10.1111/1755-0998.13925

Bossert, Silas; Pauly, Alain; Danforth, Bryan N.; Orr, Michael C.; Murray, Elizabeth A. (January 2024, Molecular Ecology Resources)

Abstract: Sequence data assembly is a foundational step in high‐throughput sequencing, with untold consequences for downstream analyses. Despite this, few studies have interrogated the many methods for assembling phylogenomic UCE data, either for their comparative efficacy or for how they may affect outputs. We study this by comparing the most commonly used assembly methods for UCEs in the under‐studied bee lineage Nomiinae and a representative sampling of relatives. Data for 63 UCE‐only and 75 mixed taxa were assembled with five methods (ABySS, HybPiper, SPAdes, Trinity and Velvet) and then benchmarked for their relative performance in terms of locus capture parameters and phylogenetic reconstruction. Unexpectedly, Trinity and Velvet trailed the other methods in locus capture and DNA matrix density, whereas SPAdes performed favourably in most assessed metrics. Compared with SPAdes, the guided‐assembly approach HybPiper generally recovered the highest-quality loci, but in lower numbers. Based on our results, we formally move Clavinomia to Dieunomiini and render Epinomia once more a subgenus of Dieunomia. We strongly advise that future studies more closely examine the influence of the assembly approach on their results or, minimally, use better‐performing assembly methods such as SPAdes or HybPiper. In this way, we can move forward with phylogenomic studies in a more standardized, comparable manner.
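DNA matrix density, one of the benchmark metrics named in the abstract, can be illustrated with a minimal Python sketch. It assumes density is the fraction of taxon-by-locus cells that contain a recovered locus. The study's exact definition may differ; for example, it might count aligned nucleotide positions rather than whole loci. The taxon names, locus names and capture results below are hypothetical.

```python
from typing import Dict, Set


def matrix_density(captured: Dict[str, Set[str]], loci: Set[str]) -> float:
    """Fraction of taxon-by-locus cells that contain a captured locus.

    Assumption: a cell counts as filled if the locus was recovered at all
    for that taxon; partial or short loci are not weighted differently.
    """
    if not captured or not loci:
        return 0.0
    # Count only loci that belong to the reference set for each taxon.
    filled = sum(len(found & loci) for found in captured.values())
    return filled / (len(captured) * len(loci))


# Hypothetical capture results from two assemblers for three taxa and four loci.
loci = {"uce-1", "uce-2", "uce-3", "uce-4"}
assembler_a = {
    "taxon_1": {"uce-1", "uce-2", "uce-3", "uce-4"},
    "taxon_2": {"uce-1", "uce-2", "uce-3"},
    "taxon_3": {"uce-1", "uce-2"},
}
assembler_b = {
    "taxon_1": {"uce-1", "uce-2"},
    "taxon_2": {"uce-1"},
    "taxon_3": {"uce-1", "uce-3"},
}

for name, captured in [("A", assembler_a), ("B", assembler_b)]:
    print(name, round(matrix_density(captured, loci), 3))
# prints "A 0.75" then "B 0.417"
```

Under this definition, an assembler that recovers fewer loci per taxon yields a sparser matrix even if every locus it does recover is of high quality. This is one way the trade-off the abstract describes between HybPiper and SPAdes could surface in downstream matrices.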

Abstract: Species occurrence data are foundational for research, conservation, and science communication, but the limited availability and accessibility of reliable data represents a major obstacle, particularly for insects, which face mounting pressures. We present BeeBDC, a new R package, and a global bee occurrence dataset to address this issue. We combined >18.3 million bee occurrence records from multiple public repositories (GBIF, SCAN, iDigBio, USGS, ALA) and smaller datasets, then standardised, flagged, deduplicated, and cleaned the data using the reproducible BeeBDC R workflow. Specifically, we harmonised species names (following established global taxonomy), country names, and collection dates, and we added record-level flags for a series of potential quality issues. These data are provided in two formats: “cleaned” and “flagged-but-uncleaned”. The BeeBDC package, together with its online documentation, lets end users modify filtering parameters to address their own research questions. By publishing reproducible R workflows and globally cleaned datasets, we can increase the accessibility and reliability of downstream analyses. This workflow can be implemented for other taxa to support research and conservation.
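The abstract describes adding record-level quality flags, then offering both a "flagged-but-uncleaned" and a "cleaned" version of the data, with filtering that users can adjust. The Python sketch below illustrates only that general idea. BeeBDC itself is an R package, and none of the names here (`Record`, `flag_records`, `clean`, the flag names or the toy records) are part of its API. The duplicate rule (same species, coordinates and date, keeping the first occurrence) is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Record:
    species: str
    country: Optional[str]
    lat: Optional[float]
    lon: Optional[float]
    date: Optional[str]  # ISO date string, e.g. "2019-06-01"
    source: str  # repository the record came from
    # Hypothetical flags: True means the record passed that check.
    flags: Dict[str, bool] = field(default_factory=dict)


def flag_records(records: List[Record]) -> List[Record]:
    """Attach quality flags to every record without removing any."""
    seen: set = set()
    for r in records:
        r.flags["has_coordinates"] = r.lat is not None and r.lon is not None
        r.flags["has_date"] = r.date is not None
        # Assumed duplicate rule: same species, coordinates and date.
        key = (r.species, r.lat, r.lon, r.date)
        r.flags["not_duplicate"] = key not in seen
        seen.add(key)
    return records


def clean(records: Iterable[Record], required: Iterable[str]) -> List[Record]:
    """Keep only records that pass every user-selected flag."""
    required = list(required)
    return [r for r in records if all(r.flags.get(f, False) for f in required)]


# Hypothetical toy records; the second repeats the first from another repository.
records = [
    Record("species_a", "Germany", 52.5, 13.4, "2019-06-01", "GBIF"),
    Record("species_a", "Germany", 52.5, 13.4, "2019-06-01", "iDigBio"),
    Record("species_b", "France", None, None, "2020-05-12", "SCAN"),
    Record("species_c", "United States", 46.0, -118.9, None, "USGS"),
]

flagged = flag_records(records)  # "flagged-but-uncleaned": all records kept
strict = clean(flagged, ["has_coordinates", "has_date", "not_duplicate"])
lenient = clean(flagged, ["not_duplicate"])  # user relaxes some filters
print(len(flagged), len(strict), len(lenient))  # prints "4 1 3"
```

Keeping flags separate from filtering means the same flagged dataset can yield different "cleaned" subsets depending on which checks a given research question requires.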
